566 research outputs found

    Distributed learning of CNNs on heterogeneous CPU/GPU architectures

    Convolutional Neural Networks (CNNs) have proven to be powerful classification tools in tasks that range from check reading to medical diagnosis, reaching close to human perception and in some cases surpassing it. However, the problems to solve are becoming larger and more complex, which translates to larger CNNs and to longer training times that not even the adoption of Graphics Processing Units (GPUs) can keep up with. This problem is partially solved by using more processing units and the distributed training methods offered by several frameworks dedicated to neural network training. However, these techniques do not take full advantage of the parallelization that CNNs allow, nor of the cooperative use of heterogeneous devices that differ in processing capability, clock speed, and memory size, among other characteristics. This paper presents a new method for the parallel training of CNNs that can be considered a particular instantiation of model parallelism, where only the convolutional layer is distributed. In fact, the convolutions processed during training (forward and backward propagation included) represent 60-90% of global processing time. The paper analyzes the influence of network size, bandwidth, batch size, number of devices (including their processing capabilities), and other parameters. Results show that this technique is capable of reducing the training time without affecting the classification performance for both CPUs and GPUs. For the CIFAR-10 dataset, using a CNN with two convolutional layers with 500 and 1500 kernels, respectively, the best speedups reach 3.28× using four CPUs and 2.45× with three GPUs. Modern imaging datasets, larger and more complex than CIFAR-10, will certainly spend more than 60-90% of processing time calculating convolutions, and speedups will tend to increase accordingly.
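The relation between the share of time spent in convolutions and the achievable speedup can be illustrated with Amdahl's law. The sketch below is a textbook idealization, not the model used in the paper: it assumes identical devices, no communication cost, and that only the convolution share of the runtime is distributed. The function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, n_devices: int) -> float:
    """Ideal speedup when only `parallel_fraction` of the runtime is split
    evenly across `n_devices` (assumption: identical devices, no overhead)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n_devices < 1:
        raise ValueError("n_devices must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_devices)


def implied_parallel_fraction(speedup: float, n_devices: int) -> float:
    """Invert Amdahl's law: the parallel share that would yield `speedup`
    on `n_devices` under the same idealized assumptions."""
    if n_devices < 2:
        raise ValueError("n_devices must be at least 2")
    if speedup <= 0.0:
        raise ValueError("speedup must be positive")
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n_devices)


if __name__ == "__main__":
    # Convolution shares quoted in the abstract (60% and 90%), four devices.
    for share in (0.60, 0.90):
        print(f"share={share:.2f}: ideal speedup {amdahl_speedup(share, 4):.2f}x")
    # Reported speedups from the abstract: 3.28x (4 CPUs), 2.45x (3 GPUs).
    print(f"4 CPUs: implied share {implied_parallel_fraction(3.28, 4):.3f}")
    print(f"3 GPUs: implied share {implied_parallel_fraction(2.45, 3):.3f}")
```

Under these assumptions a 90% convolution share caps four devices at roughly 3.08×, so the reported 3.28× would correspond to a share of about 93%. That is in line with the abstract's remark that larger networks spend more of their time in convolutions, though real heterogeneous devices and bandwidth limits make this only a rough reading.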

    Pragma-Oriented Parallelization of the Direct Sparse Odometry SLAM Algorithm

    Monocular 3D reconstruction is a challenging computer vision task that becomes even more stimulating when we aim at real-time performance. One way to obtain 3D reconstruction maps is through the use of Simultaneous Localization and Mapping (SLAM), a recurrent engineering problem, mainly in the area of robotics. It consists of building and updating a consistent map of the unknown environment and, simultaneously, saving the pose of the robot, or the camera, at every given time instant. A variety of algorithms has been proposed to address this problem, namely the Large Scale Direct Monocular SLAM (LSD-SLAM), ORB-SLAM, Direct Sparse Odometry (DSO) or Parallel Tracking and Mapping (PTAM), among others. However, although these algorithms provide good results, they are computationally intensive. Hence, in this paper, we propose a modified version of DSO SLAM that implements code parallelization techniques using OpenMP, an API for introducing parallelism in C, C++ and Fortran programs that supports multi-platform shared-memory multiprocessing programming. With this approach we propose multiple directive-based code modifications in order to make the SLAM algorithm execute considerably faster. The performance of the proposed solution was evaluated on standard datasets and provides speedups above 40% without significant extra parallel programming effort.

    On the Evaluation of Energy-Efficient Deep Learning Using Stacked Autoencoders on Mobile GPUs

    Over the last years, deep learning architectures have gained attention by winning important international detection and classification challenges. However, due to high levels of energy consumption, the need to use low-power devices at acceptable throughput performance is higher than ever. This paper tries to solve this problem by introducing energy-efficient deep learning based on local training and using low-power mobile GPU parallel architectures, all conveniently supported by the same high-level description of the deep network. Also, it proposes to discover the maximum dimensions that a particular type of deep learning architecture, the stacked autoencoder, can support by finding the hardware limitations of a representative group of mobile GPUs and platforms.

    Observation of the doubly charmed baryon decay $\Xi_{cc}^{++}\to\Xi_{c}^{\prime+}\pi^{+}$

    The $\Xi_{cc}^{++}\to\Xi_{c}^{\prime+}\pi^{+}$ decay is observed using proton-proton collisions collected by the LHCb experiment at a centre-of-mass energy of 13 TeV, corresponding to an integrated luminosity of 5.4 $\mathrm{fb}^{-1}$. The $\Xi_{cc}^{++}\to\Xi_{c}^{\prime+}\pi^{+}$ decay is reconstructed partially, where the photon from the $\Xi_{c}^{\prime+}\to\Xi_{c}^{+}\gamma$ decay is not reconstructed and the $pK^{-}\pi^{+}$ final state of the $\Xi_{c}^{+}$ baryon is employed. The $\Xi_{cc}^{++}\to\Xi_{c}^{\prime+}\pi^{+}$ branching fraction relative to that of the $\Xi_{cc}^{++}\to\Xi_{c}^{+}\pi^{+}$ decay is measured to be $1.41 \pm 0.17 \pm 0.10$, where the first uncertainty is statistical and the second systematic.

    Test of lepton universality in $b \rightarrow s \ell^+ \ell^-$ decays

    The first simultaneous test of muon-electron universality using $B^{+}\rightarrow K^{+}\ell^{+}\ell^{-}$ and $B^{0}\rightarrow K^{*0}\ell^{+}\ell^{-}$ decays is performed, in two ranges of the dilepton invariant-mass squared, $q^{2}$. The analysis uses beauty mesons produced in proton-proton collisions collected with the LHCb detector between 2011 and 2018, corresponding to an integrated luminosity of 9 $\mathrm{fb}^{-1}$. Each of the four lepton universality measurements reported is either the first in the given $q^{2}$ interval or supersedes previous LHCb measurements. The results are compatible with the predictions of the Standard Model. Comment: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2022-046.html (LHCb public pages).

    Study of charmonium and charmonium-like contributions in $B^{+}\to J/\psi\eta K^{+}$ decays

    A study of $B^{+}\to J/\psi\eta K^{+}$ decays, followed by $J/\psi\to\mu^{+}\mu^{-}$ and $\eta\to\gamma\gamma$, is performed using a dataset collected with the LHCb detector in proton-proton collisions at centre-of-mass energies of 7, 8 and 13 TeV, corresponding to an integrated luminosity of 9 $\mathrm{fb}^{-1}$. The $J/\psi\eta$ mass spectrum is investigated for contributions from charmonia and charmonium-like states. Evidence is found for the $B^{+}\to(\psi_{2}(3823)\to J/\psi\eta)K^{+}$ and $B^{+}\to(\psi(4040)\to J/\psi\eta)K^{+}$ decays with significances of 3.4 and 4.7 standard deviations, respectively. This constitutes the first evidence for the $\psi_{2}(3823)\to J/\psi\eta$ decay.

    Second asymptomatic carotid surgery trial (ACST-2): a randomised comparison of carotid artery stenting versus carotid endarterectomy

    Background: Among asymptomatic patients with severe carotid artery stenosis but no recent stroke or transient cerebral ischaemia, either carotid artery stenting (CAS) or carotid endarterectomy (CEA) can restore patency and reduce long-term stroke risks. However, from recent national registry data, each option causes about 1% procedural risk of disabling stroke or death. Comparison of their long-term protective effects requires large-scale randomised evidence. Methods: ACST-2 is an international multicentre randomised trial of CAS versus CEA among asymptomatic patients with severe stenosis thought to require intervention, interpreted with all other relevant trials. Patients were eligible if they had severe unilateral or bilateral carotid artery stenosis and both doctor and patient agreed that a carotid procedure should be undertaken, but they were substantially uncertain which one to choose. Patients were randomly allocated to CAS or CEA and followed up at 1 month and then annually, for a mean 5 years. Procedural events were those within 30 days of the intervention. Intention-to-treat analyses are provided. Analyses including procedural hazards use tabular methods. Analyses and meta-analyses of non-procedural strokes use Kaplan-Meier and log-rank methods. The trial is registered with the ISRCTN registry, ISRCTN21144362. Findings: Between Jan 15, 2008, and Dec 31, 2020, 3625 patients in 130 centres were randomly allocated, 1811 to CAS and 1814 to CEA, with good compliance, good medical therapy and a mean 5 years of follow-up. Overall, 1% had disabling stroke or death procedurally (15 allocated to CAS and 18 to CEA) and 2% had non-disabling procedural stroke (48 allocated to CAS and 29 to CEA). Kaplan-Meier estimates of 5-year non-procedural stroke were 2·5% in each group for fatal or disabling stroke, and 5·3% with CAS versus 4·5% with CEA for any stroke (rate ratio [RR] 1·16, 95% CI 0·86–1·57; p=0·33). 
    Combining RRs for any non-procedural stroke in all CAS versus CEA trials, the RR was similar in symptomatic and asymptomatic patients (overall RR 1·11, 95% CI 0·91–1·32; p=0·21). Interpretation: Serious complications are similarly uncommon after competent CAS and CEA, and the long-term effects of these two carotid artery procedures on fatal or disabling stroke are comparable. Funding: UK Medical Research Council and Health Technology Assessment Programme.
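As a rough consistency check on the 5-year any-stroke comparison quoted in the Findings (RR 1·16, 95% CI 0·86–1·57; p=0·33), the p-value can be approximated from the rate ratio and its confidence interval. This is a minimal sketch under assumptions that are not taken from the trial: it uses a normal (Wald) approximation on the log scale, whereas the abstract states that log-rank methods were used, so only approximate agreement should be expected. The function name is hypothetical.

```python
import math


def p_value_from_ratio_ci(ratio: float, ci_low: float, ci_high: float,
                          z_crit: float = 1.959964) -> tuple[float, float]:
    """Approximate two-sided p-value for a ratio given its 95% CI.

    Assumption: log(ratio) is normally distributed, so the CI half-width
    on the log scale equals z_crit standard errors.
    Returns (z statistic, two-sided p-value).
    """
    if min(ratio, ci_low, ci_high) <= 0.0 or ci_low >= ci_high:
        raise ValueError("need 0 < ci_low < ci_high and ratio > 0")
    log_ratio = math.log(ratio)
    std_err = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = log_ratio / std_err
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


if __name__ == "__main__":
    z, p = p_value_from_ratio_ci(1.16, 0.86, 1.57)
    print(f"z = {z:.3f}, two-sided p = {p:.3f}")
```

For the any-stroke rate ratio, this approximation gives a p-value close to the reported 0·33.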

    Observation of the Decay $\Lambda_b^0\to\Lambda_c^+\tau^-\bar{\nu}_\tau$

    The first observation of the semileptonic $b$-baryon decay $\Lambda_b^0\to\Lambda_c^+\tau^-\bar{\nu}_\tau$, with a significance of $6.1\sigma$, is reported using a data sample corresponding to 3 $\mathrm{fb}^{-1}$ of integrated luminosity, collected by the LHCb experiment at center-of-mass energies of 7 and 8 TeV at the LHC. The $\tau^-$ lepton is reconstructed in the hadronic decay to three charged pions. The ratio $K=\mathcal{B}(\Lambda_b^0\to\Lambda_c^+\tau^-\bar{\nu}_\tau)/\mathcal{B}(\Lambda_b^0\to\Lambda_c^+\pi^-\pi^+\pi^-)$ is measured to be $2.46\pm0.27\pm0.40$, where the first uncertainty is statistical and the second systematic. The branching fraction $\mathcal{B}(\Lambda_b^0\to\Lambda_c^+\tau^-\bar{\nu}_\tau)=(1.50\pm0.16\pm0.25\pm0.23)\%$ is obtained, where the third uncertainty is from the external branching fraction of the normalization channel $\Lambda_b^0\to\Lambda_c^+\pi^-\pi^+\pi^-$. The ratio of semileptonic branching fractions $R(\Lambda_c^+)\equiv\mathcal{B}(\Lambda_b^0\to\Lambda_c^+\tau^-\bar{\nu}_\tau)/\mathcal{B}(\Lambda_b^0\to\Lambda_c^+\mu^-\bar{\nu}_\mu)$ is derived to be $0.242\pm0.026\pm0.040\pm0.059$, where the external branching fraction uncertainty from the channel $\Lambda_b^0\to\Lambda_c^+\mu^-\bar{\nu}_\mu$ contributes to the last term. This result is in agreement with the standard model prediction.
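The way the quoted branching fraction follows from the measured ratio K can be sketched with simple scaling. The snippet assumes, as the abstract implies but does not spell out, that the branching fraction is K multiplied by the external normalization branching fraction, so that the relative statistical and systematic uncertainties of K carry over unchanged. All numbers are the ones quoted above; the function and variable names are hypothetical.

```python
def scale_uncertainties(value: float, ref_value: float,
                        ref_uncertainties: list[float]) -> list[float]:
    """Carry the relative uncertainties of `ref_value` over to `value`
    (assumption: value = ref_value * constant)."""
    if ref_value == 0.0:
        raise ValueError("ref_value must be non-zero")
    return [value * u / ref_value for u in ref_uncertainties]


# Numbers quoted in the abstract.
K, K_STAT, K_SYST = 2.46, 0.27, 0.40
BF_TAU_PERCENT = 1.50          # B(Lb -> Lc tau nu), in percent
BF_TAU_EXT_PERCENT = 0.23      # third (external) uncertainty, in percent

# Normalization branching fraction implied by BF = K * BF_norm (in percent).
implied_norm_percent = BF_TAU_PERCENT / K

# Statistical and systematic uncertainties scaled from K to the BF.
bf_stat, bf_syst = scale_uncertainties(BF_TAU_PERCENT, K, [K_STAT, K_SYST])

# Relative size of the external uncertainty.
relative_external = BF_TAU_EXT_PERCENT / BF_TAU_PERCENT

if __name__ == "__main__":
    print(f"implied normalization BF: {implied_norm_percent:.3f} %")
    print(f"scaled stat: {bf_stat:.3f} %, scaled syst: {bf_syst:.3f} %")
    print(f"relative external uncertainty: {relative_external:.3f}")
```

The scaled values come out at about 0.16 and 0.24, close to the quoted 0.16 and 0.25; the small difference is plausibly due to rounding of the inputs.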

    Precision measurement of $\mathit{CP}$ violation in the penguin-mediated decay $B_s^{0}\rightarrow\phi\phi$

    A flavor-tagged time-dependent angular analysis of the decay $B_s^{0}\rightarrow\phi\phi$ is performed using $pp$ collision data collected by the LHCb experiment at a center-of-mass energy of 13 TeV, corresponding to an integrated luminosity of 6 $\mathrm{fb}^{-1}$. The $\mathit{CP}$-violating phase and direct $\mathit{CP}$-violation parameter are measured to be $\phi_{s\bar{s}s} = -0.042 \pm 0.075 \pm 0.009$ rad and $|\lambda|=1.004\pm 0.030 \pm 0.009$, respectively, assuming the same values for all polarization states of the $\phi\phi$ system. In these results, the first uncertainties are statistical and the second systematic. These parameters are also determined separately for each polarization state, showing no evidence for polarization dependence. The results are combined with previous LHCb measurements using $pp$ collisions at center-of-mass energies of 7 and 8 TeV, yielding $\phi_{s\bar{s}s} = -0.074 \pm 0.069$ rad and $|\lambda|=1.009 \pm 0.030$. This is the most precise study of time-dependent $\mathit{CP}$ violation in a penguin-dominated $B$ meson decay. The results are consistent with $\mathit{CP}$ symmetry and with the Standard Model predictions. Comment: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2023-001.html (LHCb public pages).

    Observation of Cabibbo-suppressed two-body hadronic decays and precision mass measurement of the $\Omega_{c}^{0}$ baryon

    The first observation of the singly Cabibbo-suppressed $\Omega_{c}^{0}\to\Omega^{-}K^{+}$ and $\Omega_{c}^{0}\to\Xi^{-}\pi^{+}$ decays is reported, using proton-proton collision data at a centre-of-mass energy of $13\,{\rm TeV}$, corresponding to an integrated luminosity of $5.4\,{\rm fb}^{-1}$, collected with the LHCb detector between 2016 and 2018. The branching fraction ratios are measured to be $\frac{\mathcal{B}(\Omega_{c}^{0}\to\Omega^{-}K^{+})}{\mathcal{B}(\Omega_{c}^{0}\to\Omega^{-}\pi^{+})}=0.0608\pm0.0051({\rm stat})\pm 0.0040({\rm syst})$ and $\frac{\mathcal{B}(\Omega_{c}^{0}\to\Xi^{-}\pi^{+})}{\mathcal{B}(\Omega_{c}^{0}\to\Omega^{-}\pi^{+})}=0.1581\pm0.0087({\rm stat})\pm0.0043({\rm syst})\pm0.0016({\rm ext})$. In addition, using the $\Omega_{c}^{0}\to\Omega^{-}\pi^{+}$ decay channel, the $\Omega_{c}^{0}$ baryon mass is measured to be $M(\Omega_{c}^{0})=2695.28\pm0.07({\rm stat})\pm0.27({\rm syst})\pm0.30({\rm ext})\,{\rm MeV}/c^{2}$, improving the precision of the previous world average by a factor of four. Comment: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2023-011.html (LHCb public pages).
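To compare the mass measurement with other results, its three uncertainty components are often condensed into a single number. The sketch below adds them in quadrature, which assumes the statistical, systematic and external components are uncorrelated; that is a common convention rather than something stated in the abstract. The function name is hypothetical.

```python
import math


def combine_in_quadrature(*uncertainties: float) -> float:
    """Total uncertainty for uncorrelated components: sqrt(sum of squares)."""
    if any(u < 0.0 for u in uncertainties):
        raise ValueError("uncertainties must be non-negative")
    return math.sqrt(sum(u * u for u in uncertainties))


# Components of the Omega_c^0 mass uncertainty quoted in the abstract (MeV/c^2).
MASS_STAT, MASS_SYST, MASS_EXT = 0.07, 0.27, 0.30

total_mass_uncertainty = combine_in_quadrature(MASS_STAT, MASS_SYST, MASS_EXT)

if __name__ == "__main__":
    print(f"total uncertainty: {total_mass_uncertainty:.2f} MeV/c^2")
```

With the quoted components the total is about 0.41 MeV/c², dominated by the systematic and external terms.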